* prompt configuration backend to be testing
* custom prompt configuration update and fixed Pyright issues
* fixed copilot reviews
* pre validation step added when user query is inserted
* added more validation cases
* fixed review comments
* added spec document to newly updated tool classification
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sync changes from buerokratt/LLM-Module wip branch
* prompt configuration backend to be testing
* custom prompt configuration update and fixed Pyright issues
* fixed copilot reviews
* pre validation step added when user query is inserted
* added more validation cases
* fixed review comments
* feat: add react-markdown and remark-gfm for rendering markdown content in TestModel page
---------
Co-authored-by: nuwangeek <charith.bimsara@rootcode.io>
Co-authored-by: Charith Nuwan Bimsara <59943919+nuwangeek@users.noreply.github.com>
Co-authored-by: erangi-ar <erangika.ariyasena@rootcode.io>
Show the response in markdown in test LLM page (buerokratt#317)
* refactor: update SSE connection URL to use environment variable
* format markdown of the LLM response
* feat: add markdown support to MessageContent component
* title fix
* prompt configuration backend to be testing
* custom prompt configuration update and fixed Pyright issues
* fixed copilot reviews
* pre validation step added when user query is inserted
* added more validation cases
* fixed review comments
* resolved pr comments
---------
Co-authored-by: erangi-ar <erangika.ariyasena@rootcode.io>
Co-authored-by: nuwangeek <charith.bimsara@rootcode.io>
Co-authored-by: Charith Nuwan Bimsara <59943919+nuwangeek@users.noreply.github.com>
Co-authored-by: Thiru Dinesh <thiru.dinesh@rootcodelabs.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Thiru Dinesh <56014038+Thirunayan22@users.noreply.github.com>
…eton with BaseWorkflow abstract class (buerokratt#318)
* prompt configuration backend to be testing
* custom prompt configuration update and fixed Pyright issues
* fixed copilot reviews
* pre validation step added when user query is inserted
* added more validation cases
* fixed review comments
* implement tool classification orchestration agent skeleton
* Apply suggestion from @Copilot
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fixed copilot suggested changes
* fixed issue
* added skills
* fixed issue
---------
Co-authored-by: Thiru Dinesh <56014038+Thirunayan22@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Thiru Dinesh <thiru.dinesh@rootcodelabs.com>
Sync wip branches
RAG System Security Assessment Report
Red Team Testing with DeepTeam Framework
Executive Summary
System Security Status: VULNERABLE
Overall Pass Rate: 0.0%
Risk Level: HIGH
Attack Vector Analysis
Only tested attack categories are shown above. Vulnerability Assessment
Multilingual Security Analysis
Failed Security Tests Analysis
(2 additional failures not shown)
Security Recommendations
Priority Actions Required
Critical Vulnerabilities (Immediate Action Required):
Attack Vector Improvements:
Specific Technical Recommendations:
General Security Enhancements:
Testing Methodology
This security assessment used DeepTeam, an advanced AI red teaming framework that simulates real-world adversarial attacks.
Test Execution Process
Attack Categories Tested
Single-Turn Attacks:
Multi-Turn Attacks:
Vulnerabilities Assessed
Language Support
Tests were conducted across multiple languages:
Pass/Fail Criteria
Report generated on 2026-02-20 10:35:55 by DeepTeam automated red teaming pipeline
RAG System Evaluation Report
DeepEval Test Results Summary
Total Tests: 20 | Passed: 0 | Failed: 20
Detailed Test Results
| Test | Language | Category | CP | CR | CRel | AR | Faith | Status |
Legend: CP = Contextual Precision, CR = Contextual Recall, CRel = Contextual Relevancy, AR = Answer Relevancy, Faith = Faithfulness
Failed Test Analysis
(90 additional failures not shown)
Recommendations
Contextual Precision (Score: 0.000): Consider improving your reranking model or adjusting reranking parameters to better prioritize relevant documents.
Contextual Recall (Score: 0.000): Review your embedding model choice and vector search parameters. Consider domain-specific embeddings.
Contextual Relevancy (Score: 0.000): Optimize chunk size and top-K retrieval parameters to reduce noise in retrieved contexts.
Answer Relevancy (Score: 0.000): Review your prompt template and LLM parameters to improve response relevance to the input query.
Faithfulness (Score: 0.000): Strengthen hallucination detection and ensure the LLM stays grounded in the provided context.
Report generated on 2026-02-20 10:36:04 by DeepEval automated testing pipeline
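As a rough illustration of how a summary line such as `Total Tests: 20 | Passed: 0 | Failed: 20` might be tallied from per-test metric scores, here is a minimal Python sketch. The `CaseResult` class, the `summarize` helper, the 0.5 threshold, and the rule that a test passes only when every metric meets the threshold are all assumptions made for illustration; the report does not state the pipeline's actual pass/fail criteria.

```python
from dataclasses import dataclass

# Hypothetical threshold; the report does not say which value the pipeline uses.
THRESHOLD = 0.5

# Metric abbreviations taken from the report's legend.
METRICS = ("CP", "CR", "CRel", "AR", "Faith")


@dataclass
class CaseResult:
    """Scores for one evaluated test case (illustrative structure only)."""
    name: str
    scores: dict  # metric abbreviation -> score in [0, 1]

    def passed(self, threshold: float = THRESHOLD) -> bool:
        # Assumed rule: every metric must meet the threshold; a missing
        # metric is treated as a score of 0.0.
        return all(self.scores.get(m, 0.0) >= threshold for m in METRICS)


def summarize(results, threshold: float = THRESHOLD) -> str:
    """Build a summary line in the same shape as the report's header."""
    total = len(results)
    passed = sum(1 for r in results if r.passed(threshold))
    return f"Total Tests: {total} | Passed: {passed} | Failed: {total - passed}"
```

Under these assumptions, twenty cases that all score 0.000 on every metric would yield the same summary line as the report, since no case clears the threshold on any metric.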
No description provided.